VIDEO SEGMENTATION USING STATISTICAL PIXEL MODELING

CROSS-REFERENCE TO RELATED APPLICATION

This application is a continuation-in-part of U.S. application Ser. No. 09/815,385, now U.S. Pat. No. 6,625,310, filed on Mar. 23, 2001, commonly-assigned, and incorporated herein by reference in its entirety.

FIELD OF THE INVENTION

The present invention relates to processing of video frames for use in video processing systems. Examples include:

- intelligent video surveillance (IVS) systems that are used as a part of, or in conjunction with, Closed Circuit Television Systems (CCTV) utilized in security, surveillance, and related homeland security and anti-terrorism systems;
- IVS systems that process surveillance video in retail establishments to establish in-store human behavior trends for market research;
- IVS systems that monitor vehicular traffic to detect wrong-way traffic, broken-down vehicles, accidents and road blockages; and
- video compression systems.

IVS systems are systems that further process video after video segmentation steps to perform object classification. In that step, foreground objects may be classified in a general class, such as animal, vehicle, or other moving-but-unclassified object. Alternatively, they may be classified in more specific classes, such as human, small- or large-non-human animal, automobile, aircraft, boat, truck, tree, flag, or water region.

In IVS systems, once such video segmentation and classification occurs, the detected objects are processed to determine how their positions, movements and behaviors relate to user-defined virtual video tripwires and virtual regions of interest (where a region of interest may be an entire field of view, or scene). User-defined events that occur will then be flagged as events of interest and communicated to the security officer or professional on duty. Examples of such events include:

- a human or a vehicle crossing a virtual video tripwire;
- a person or vehicle loitering in, or entering, a virtual region of interest or scene; and
- an object being left behind in, or taken away from, a virtual region or scene.

In particular, the present invention deals with ways of segmenting video frames into their component parts using statistical properties of regions comprising the video frames.

BACKGROUND OF THE INVENTION

Several applications separate the input video into two streams: object-based video compression, video segmentation for detecting and tracking video objects, and other types of object-oriented video processing. One stream contains the information representing stationary background information. The other stream contains information representing the moving portions of the video, to be denoted as foreground information.

The background information is represented as a background model, which includes a scene model, i.e., a composite image composed from a series of related images, as, for example, one would find in a sequence of video frames. The background model may also contain additional models and modeling information. Scene models are generated by aligning images (for example, by matching points and/or regions) and determining overlap among them. Generation of scene models is discussed in further depth in commonly-assigned U.S. patent application Ser. No. 09/472,162, filed Dec. 27, 1999, and Ser. No. 09/609,919, filed Jul. 3, 2000, both incorporated by reference in their entireties herein.

In an efficient transmission or storage scheme, the scene model need be transmitted only once, while the foreground information is transmitted for each frame. Consider, for example, an observer (i.e., a camera or the like, which is the source of the video) that undergoes only pan, tilt, roll, and zoom types of motion. The scene model then need be transmitted only once, because its appearance does not change from frame to frame except in a well-defined way based on the observer motion, which can be accounted for by transmitting motion parameters. Note that such techniques are also applicable to forms of motion other than pan, tilt, roll, and zoom. In IVS systems, the creation of moving foreground and background objects allows the system to attempt classification on the moving objects of interest, even when the background pixels may be undergoing apparent motion due to pan, tilt and zoom motion of the camera.

To make automatic object-oriented video processing feasible, it is necessary to be able to distinguish the regions in the video sequence that are moving or changing and to separate (i.e., segment) them from the stationary background regions. This segmentation must be performed in the presence of apparent motion. Such motion may be induced, for example, by a panning, tilting, rolling, and/or zooming observer, or by other motion-related phenomena, including actual observer motion.

To account for this motion, images are first aligned; that is, corresponding locations in the images (i.e., frames) are determined, as discussed above. After this alignment, objects that are truly moving or changing, relative to the stationary background, can be segmented from the stationary objects in the scene. The stationary regions are then used to create (or to update) the scene model, and the moving foreground objects are identified for each frame.

It is not easy to identify and automatically distinguish between video objects that are moving foreground and stationary background, particularly in the presence of observer motion, as discussed above. Furthermore, to provide the maximum degree of compression or the maximum fineness or accuracy of other video processing techniques, it is desirable to segment foreground objects as finely as possible. This enables, for example, the maintenance of smoothness between successive video frames and crispness within individual frames.

Known techniques, however, have proven difficult to use and inaccurate for small foreground objects, and they have required excessive processing power and memory. It would, therefore, be desirable to have a technique that permits accurate segmentation between the foreground and background information, and accurate, crisp representations of the foreground objects, without the limitations of prior techniques.

SUMMARY OF THE INVENTION

The present invention is directed to a method for segmentation of video into foreground information and background information, based on statistical properties of the source video. More particularly, the method is based on creating and updating statistical information pertaining to a characteristic of regions of the video, and on labeling those regions (i.e., as foreground or background) based on the statistical information. For example, in one embodiment, the regions are pixels, and the characteristic is chromatic intensity. Many other possibilities exist, as will become apparent. In more particular embodiments, the invention is
directed to methods of using the inventive video segmentation methods to implement intelligent video surveillance systems.

In embodiments of the invention, a background model is developed containing at least two components. A first component is the scene model, which may be built and updated, for example, as discussed in the aforementioned U.S. patent applications. A second component is a background statistical model.

In a first embodiment, the inventive method comprises a two-pass process of video segmentation:

- In the first pass, a background statistical model is built and updated. An embodiment of the first pass comprises steps of aligning each video frame with a scene model and updating the background statistical model based on the aligned frame data.
- In the second pass, regions in the frames are segmented. An embodiment of the second pass comprises, for each frame, steps of labeling regions of the frame and performing spatial filtering.

In a second embodiment, the inventive method comprises a one-pass process of video segmentation. For each frame in a frame sequence of a video stream, the single pass comprises the following steps:

- aligning the frame with a scene model;
- building a background statistical model;
- labeling the regions of the frame; and
- performing spatial/temporal filtering.

In yet another embodiment, the inventive method comprises a modified version of the aforementioned one-pass process of video segmentation. This embodiment is similar to the previous embodiment, except that the step of building a background statistical model is replaced with a step of building a background statistical model and a secondary statistical model.

Each of these embodiments may be embodied in the form of a computer system running software that executes their steps, and in the form of a computer-readable medium containing software representing their steps.

DEFINITIONS

In describing the invention, the following definitions are applicable throughout (including above).

A "computer" refers to any apparatus that is capable of accepting a structured input, processing the structured input according to prescribed rules, and producing results of the processing as output. Examples of a computer include:

- a computer;
- a general purpose computer;
- a supercomputer;
- a mainframe;
- a super mini-computer;
- a mini-computer;
- a workstation;
- a micro-computer;
- a server;
- an interactive television;
- a hybrid combination of a computer and an interactive television; and
- application-specific hardware to emulate a computer and/or software.

A computer can have a single processor or multiple processors, which can operate in parallel and/or not in parallel. A computer also refers to two or more computers connected together via a network for transmitting or receiving information between the computers. An example of such a computer includes a distributed computer system for processing information via computers linked by a network.

A "computer-readable medium" refers to any storage device used for storing data accessible by a computer. Examples of a computer-readable medium include:

- a magnetic hard disk;
- a floppy disk;
- an optical disk, like a CD-ROM or a DVD;
- a magnetic tape;
- a memory chip; and
- a carrier wave used to carry computer-readable electronic data, such as those used in transmitting and receiving e-mail or in accessing a network.

"Software" refers to prescribed rules to operate a computer. Examples of software include: software; code segments; instructions; computer programs; and programmed logic.

A "computer system" refers to a system having a computer, where the computer comprises a computer-readable medium embodying software to operate the computer.

A "network" refers to a number of computers and associated devices that are connected by communication facilities. A network involves permanent connections, such as cables, or temporary connections, such as those made through telephone or other communication links. Examples of a network include:

- an internet, such as the Internet;
- an intranet;
- a local area network (LAN);
- a wide area network (WAN); and
- a combination of networks, such as an internet and an intranet.

"Video" refers to motion pictures represented in analog and/or digital form. Examples of video include:

- video feeds from CCTV systems in security, surveillance and anti-terrorism applications;
- television;
- movies;
- image sequences from a camera or other observer; and
- computer-generated image sequences.

Video can be obtained from, for example:

- a wired or wireless live feed;
- a storage device;
- a firewire interface;
- a video digitizer;
- a video streaming server, device or software component;
- a computer graphics engine; or
- a network connection.

"Video processing" refers to any manipulation of video, including, for example, compression and editing.

A "frame" refers to a particular image or other discrete unit within a video.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention will now be described in further detail in connection with the attached drawings, in which:

FIG. 1 shows a flowchart corresponding to an implementation of a first embodiment of the invention;

FIGS. 2a and 2b show flowcharts corresponding to two alternative embodiments of the labeling step in the flowchart of FIG. 1;

FIGS. 3a and 3b show flowcharts corresponding to implementations of the spatial/temporal filtering step in the flowchart of FIG. 1;

FIG. 4 shows a flowchart corresponding to an implementation of a second embodiment of the invention;

FIG. 5 shows a flowchart corresponding to an implementation of one of the steps in the flowchart of FIG. 4;

FIGS. 6a and 6b together show a flowchart corresponding to an implementation of another one of the steps in the flowchart of FIG. 4;

FIG. 7 shows a flowchart corresponding to an implementation of a third embodiment of the invention;

FIGS. 8a and 8b together show a flowchart corresponding to an implementation of one of the steps in the flowchart of FIG. 7;

FIG. 9 depicts an embodiment of the invention in the form of software embodied on a computer-readable medium, which may be part of a computer system; and

FIG. 10 depicts a flowchart of a method of implementing an intelligent video surveillance system according to an embodiment of the invention.

Note that identical objects are labeled with the same reference numerals in all of the drawings that contain them.
DETAILED DESCRIPTION OF THE INVENTION

As discussed above, the present invention is directed to the segmentation of video streams into two kinds of information. Foreground information corresponds to moving objects. Background information corresponds to the stationary portions of the video. The present invention may be embodied in a number of ways, of which three specific ones are discussed below. These embodiments are meant to be exemplary, rather than exclusive.

The ensuing discussion refers to "pixels" and "chromatic intensity"; however, the inventive method is not so limited. The processing may involve any type of region, not just a pixel (including regions comprising multiple pixels). It may also use any type of characteristic measured with respect to, or related to, such a region, not just chromatic intensity.

1. First Embodiment - Two-Pass Segmentation

The first embodiment of the invention is depicted in FIG. 1 and corresponds to a two-pass method of segmentation. As shown in FIG. 1, the method begins by obtaining a frame (or video) sequence from a video stream (Step 1). The frame sequence preferably includes two or more frames of the video stream. The frame sequence can be, for example, a portion of the video stream or the entire video stream. As a portion of the video stream, the frame sequence can be, for example, one continuous sequence of frames or two or more discontinuous sequences of frames.

After Step 1, in Step 2, it is determined whether or not all frames have yet been processed. If not, the next frame is taken and aligned with the underlying scene model of the video stream (Step 3). As part of this alignment step, the scene model is also built and updated. Such alignment is discussed above. More detailed discussions of alignment techniques may be found, for example, in commonly-assigned U.S. patent application Ser. No. 09/472,162, filed Dec. 27, 1999, and Ser. No. 09/609,919, filed Jul. 3, 2000, both incorporated by reference in their entireties herein, as well as in numerous other references.

The inventive method is based on the use of statistical modeling to determine whether a particular pixel should be classified as being a foreground object or a part thereof, or as being the background or a part thereof. Step 4 deals with the building and updating of a statistical model of the background, using each frame aligned in Step 3.

The statistical model of the present invention comprises first- and second-order statistics. In the ensuing discussion, mean and standard deviation will be used as such first- and second-order statistics; however, this is meant to be merely exemplary of the statistics that may be used.

In general, the mean of $N$ samples, $\bar{x}$, is computed by taking the sum of the samples and dividing it by $N$, i.e.,

$$\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i, \qquad (1)$$

where $x_i$ is a particular sample corresponding to a given pixel (or region). In the present case, $x_i$ could be, for example, the measured chromatic intensity of the $i$-th sample corresponding to the given pixel (or region). In the present setting, then, such a mean would be computed for each pixel or region.

Eqn. (1) gives the general formula for a sample mean, but it may not always be optimal to use this formula. In video processing applications, a pixel's sample value may change drastically when an object moves through the pixel. It may then change (drastically) back to a value around its previous value after the moving object is no longer within that pixel. To address this type of consideration, the invention uses a weighted average, in which the prior values are weighted more heavily than the present value. In particular, the following equation may be used:

$$\bar{x}_N = W_p \, \bar{x}_{N-1} + W_n \, x_N, \qquad (2)$$

where $W_p$ is the weight of the past values and $W_n$ is the weight assigned to the newest value. Additionally, $\bar{x}_J$ represents the weighted average taken over $J$ samples, and $x_K$ represents the $K$-th sample. $W_p$ and $W_n$ may be set to any pair of values between zero and one, subject to two conditions:

- their sum is one; and
- $W_n < W_p$, which guarantees that the past values are more heavily weighted than the newest value.

As an example, the inventors have successfully used $W_p = 0.9$ and $W_n = 0.1$.

Standard deviation, $\sigma$, is determined as the square root of the variance, $\sigma^2$, of the values under consideration. In general, variance is determined by the following formula:

$$\sigma^2 = \overline{x^2} - \left(\bar{x}\right)^2, \qquad (3)$$

where $\overline{x^2}$ represents the average of $x^2$; thus, the standard deviation is given by

$$\sigma = \sqrt{\overline{x^2} - \left(\bar{x}\right)^2}. \qquad (4)$$

Since the inventive method uses running statistics, this becomes

$$\sigma = \sqrt{\left\{\overline{x^2}\right\}_N - \left(\bar{x}_N\right)^2}, \qquad (4a)$$

where $\bar{x}_N$ is as defined in Eqn. (2) above. The quantity $\{\overline{x^2}\}_N$ is defined as the weighted average of the squared values of the samples, through the $N$-th sample, and is given by

$$\left\{\overline{x^2}\right\}_N = W_p \left\{\overline{x^2}\right\}_{N-1} + W_n \, x_N^2. \qquad (5)$$

As in the case of the weighted average of the sample values, the weights are used to ensure that past values are more heavily weighted than the present value.

Given this, Step 4 creates and updates the statistical model by computing the value of Eqn. (4a) for each pixel, for each frame. In Step 4, the values for the pixels are also stored on a pixel-by-pixel basis, as opposed to the frame-by-frame basis on which they are received. That is, an array of values is compiled for each pixel over the sequence of frames. Note that in an alternative embodiment, Step 4 only performs this storage of values.

Following Step 4, the method returns to Step 2 to check whether or not all of the frames have been processed. If they have, then the method proceeds to Step 5, which commences the second pass of the embodiment.

In Step 5, the statistical background model is finalized. This is done by using the stored values for each pixel and determining their mode, the mode being the value that occurs most often. This may be accomplished, for example, by taking a histogram of the stored values and selecting the value for which the histogram has the highest value. The
mode of each pixel is then assigned as the value of the background statistical model for that pixel.

Following Step 5, the method proceeds to Step 6, which determines whether or not all of the frames have been processed yet. If not, then the method proceeds to Step 7, in which each pixel in the frame is labeled as being a foreground (FG) pixel or a background (BG) pixel. Two alternative embodiments of the workings of this step are shown in the flowcharts of FIGS. 2a and 2b.

FIG. 2a depicts a two decision level method. In FIG. 2a, the pixel labeling Step 7 begins with Step 71, where it is determined whether or not all of the pixels in the frame have been processed. If not, then the method proceeds to Step 72 to examine the next pixel.

Step 72 determines whether or not the pixel matches the background statistical model, i.e., whether the value of the pixel matches the mode for that pixel. This is performed by taking the absolute difference between the pixel value and the value of the background statistical model for the pixel (i.e., the mode), and comparing it with a threshold. That is, the quantity

$$\Delta = \left| x_{\text{pixel}} - m_{\text{pixel}} \right| \qquad (6)$$

is compared with a threshold $\theta$. Here $x_{\text{pixel}}$ denotes the value of the pixel, and $m_{\text{pixel}}$ denotes the value of the background statistical model (the mode) for that pixel.

The threshold $\theta$ may be determined in many ways. For example, it may be taken to be a function of the standard deviation $\sigma$ of the given pixel:

- In a particular exemplary embodiment, $\theta = 3\sigma$.
- In another embodiment, $\theta = K\sigma$, where $K$ is chosen by the user.
- As another example, $\theta$ may be assigned a predetermined value (again, for each pixel) or one chosen by the user.

If $\Delta \le \theta$, then the pixel value is considered to match the background statistical model. In this case, the pixel is labeled as background (BG) in Step 73, and the algorithm proceeds back to Step 71. Otherwise, if $\Delta > \theta$, then the pixel value is considered not to match the background statistical model, and the pixel is labeled as foreground (FG) in Step 74. Again, the algorithm then proceeds back to Step 71. If Step 71 determines that all of the pixels (in the frame) have been processed, then Step 7 is finished.

FIG. 2b depicts a three decision level method, labeled 7'. In FIG. 2b, the process once again begins with Step 71, a step of determining whether or not all pixels have yet been processed. If not, the process considers the next pixel to be processed and executes Step 72. Step 72 determines whether or not the pixel being processed matches the background statistical model, in the same way as in FIG. 2a. If yes, then the pixel is labeled as BG (Step 73), and the process loops back to Step 71. If not, then the process proceeds to Step 75; this is where the process of FIG. 2b is distinguished from that of FIG. 2a.

In Step 75, the process determines whether or not the pixel under consideration is far from matching the background statistical model. This is accomplished via a threshold test similar to that of Step 72, except that in Step 75, $\theta$ is given a larger value. As in Step 72, $\theta$ may be user-assigned or predetermined. In one embodiment, $\theta = N\sigma$, where $N$ is either a predetermined or user-set number, $N > K$. In another embodiment, $N = 6$.

If the result of Step 75 is that $\Delta \le \theta$, then the pixel is labeled as FG (Step 74). If not, then the pixel is labeled definite foreground (DFG), in Step 76. In each case, the process loops back to Step 71. Once Step 71 determines that all pixels in the frame have been processed, Step 7 is complete.

Returning to FIG. 1, once all of the pixels of a frame have been labeled, the process proceeds to Step 8, in which spatial/temporal filtering is performed. While shown as a sequential step in FIG. 1, Step 8 may alternatively be performed in parallel with Step 7. Details of Step 8 are shown in the flowcharts of FIGS. 3a and 3b.

In FIG. 3a, Step 8 commences with a test as to whether or not all of the pixels of the frame have been processed (Step 81). If not, then the algorithm selects the next pixel, $P_i$, for processing and proceeds to Step 82, where it is determined whether or not the pixel is labeled as BG. If it is, then the process goes back to Step 81. If not, then the pixel undergoes further processing in Steps 83 and 84.

Step 83, neighborhood filtering, is used to correct for misalignments when the images are aligned. If the current image is slightly misaligned with the growing background statistical model, the inventive segmentation procedure will label some pixels as foreground, particularly near strong edges. Neighborhood filtering corrects for this. An embodiment of Step 83 is depicted in the flowchart of FIG. 3b.

In FIG. 3b, Step 83 begins by determining the scene model location, $P_m$, corresponding to $P_i$ (Step 831). Next, a neighborhood, comprising the pixels $P'_m$ surrounding $P_m$ in the scene model, is selected (Step 832). Step 833 then determines whether all of the pixels in the neighborhood have been processed. If yes, Step 83 is complete, and the label of $P_i$ remains as it was. If not, the process proceeds to Step 834, where the next neighborhood pixel $P'_m$ is considered.

Step 835 then tests whether or not $P_i$ matches $P'_m$. This matching test is accomplished by executing the labeling step (Step 7 or 7') in a modified fashion, using $P_i$ as the pixel under consideration and $P'_m$ as the "corresponding" background statistical model point. If the labeling step returns a label of FG or DFG, there is no match; if it returns a label of BG, there is a match. If there is no match, the process loops back to Step 833. If there is a match, this indicates that $P_i$ might be mislabeled, and the process continues to Step 836.

In Step 836, a neighborhood, comprising the pixels $P'_i$ surrounding $P_i$ in the frame, is selected, and an analogous process is performed. That is, in Step 837, it is determined whether or not all of the pixels $P'_i$ in the neighborhood have yet been considered. If yes, then Step 83 is complete, and the label of $P_i$ remains as it was. If not, then the process proceeds to Step 838, where the next neighborhood pixel, $P'_i$, is considered.

Step 839 tests whether $P_m$ matches $P'_i$. This is performed analogously to Step 835, with the $P'_i$ under consideration used as the pixel being considered and $P_m$ as its "corresponding" background statistical model point. If it does not match, the process loops back to Step 837. If it does, then $P_i$ is relabeled as BG, and Step 83 is complete.

Returning to FIG. 3a, following Step 83, Step 84 is executed, in which morphological erosions and dilations are performed:

1. First, a predetermined number, $n$, of erosions are performed to remove incorrectly labeled foreground. Note that pixels labeled DFG may not be eroded, because they represent pixels that are almost certainly foreground.
2. These are followed by $n$ dilations, which restore the pixels that were correctly labeled as foreground but were eroded.
3. Finally, a second predetermined number, $m$, of dilations are performed to fill in holes in foreground objects.

The erosions and dilations may be performed using conventional erosion and dilation techniques, applied in accordance with user-specified parameters and modified, as discussed above, such that pixels labeled DFG are not eroded.
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In alternative embodiments, Step 84 may comprise filter- its standard deviation) is reasonably small. In an embodi- 

ing techniques other tlian or in addition to morphological ment of the present invention, Step 313 determines this by 

erosions and dilations, hi general. Step 84 may employ any comparing the standard deviation with a user-defined thresh- 

form or forms of spatial and/or temporal filtering. old parameter; if the standard deviation is less than this 

Returning to FIG. 1, following Step 8, the algorithm s threshold, then the statistical background model (for that 

returns to Step 6, to determine whether or not all frames have pixel) is determined to be stable. 

been processed. If yes, then the processing of.the frame As t0 me flow of St ep 31, in FIG: 5; if the background- 
sequence is complete, and the process ends (Step 9). statistical model is determined to be mature (Step 312), it is 
This two-pass embodiment has the advantage of relative determined whether or not the background statistical model 
simplicity, and it is an acceptable approach for applications to is stable (Step 313) if either of these tests (Steps 312 and 
not requiring immediate or low-latency processing. 313) fails, the process proceeds to Step 315, in which the 
Examples of such applications include off-line video com- background statistical model of the pixel being processed is 
pression and non-linear video editing and forensic process- updated using the current value of that pixel. Step 315 will 
ing of security and surveillance video. On the other hand, be explained further below. " 

many other applications such as video security and surveil- is ..... . , . . .. . . . . . . . ' . 

lance" in whichtaely event reporting is critical do have such . . .. * SS£ r SS^ L^f^ZllS 

requirements, and the embodtoetni to be discussed below ^ *» ' ble J» St fP s ^} ^ 313) ttte proces 

S2 , i;i -„H ' oHHtocc t . . proceeds to Step 314, where it is determined whether or not 

are tailored to address these requirements. ^ pixe , ^ ^ 

2. Second Embodiment — One-Pass Segmentation 2Q model. If yes, then the background statistical model is 

FIG. 4 depicts a flowchart of a one-pass segmentation updated using the current pixel value (Step 315); if no, then 

process, according to a second embodiment of the invention. the process loops back to Step 311 to determine if all pixels 

Comparing FIG. 4 with FIG. 1 (the first embodiment), the m the frame have been processed, 
second embodiment differs in that there is only a single pass Step 314 operates by determining whether or not the 

of processing for each frame sequence. This single pass, as ^ current pixel value is within some range of the mean value 

shown in Steps 2, 3, 31, 32, 8 in FIG. 4, incorporates the of the pixel, according to the current background statistical 

processes of the second pass (Steps 5-8 in FIG. 1) with the model. In one embodiment of the invention, the range is a 

first pass (Steps 2-4 in FIG. 1), albeit in a modified form, as user-defined range. In yet another embodiment, it is deter- 

will be discussed below. mined to be a user-defined number of standard deviations; 

As in the case of the first embodiment, the second embodiment (one-pass process), shown in FIG. 4, begins by obtaining a frame sequence (Step 1). As in the first embodiment, the process then performs a test to determine whether or not all of the frames have yet been processed (Step 2). Also as in the first embodiment, if the answer is no, then the next frame to be processed is aligned with the scene model (Step 3). As discussed above, the scene model component of the background model is built and updated as part of Step 3, so there is always at least a deterministically-determined value in the background model at each location.

At this point, the process includes a step of building a background statistical model (Step 31). This differs from Step 4 of FIG. 1, and is depicted in further detail in FIG. 5. The process begins with a step of determining whether or not all pixels in the frame being processed have been processed (Step 311). If not, then the process determines whether or not the background statistical model is "mature" (Step 312) and "stable" (Step 313).

The reason for Steps 312 and 313 is that, initially, the statistical background model will not be sufficiently developed to make accurate decisions as to the nature of pixels. To overcome this, some number of frames should be processed before pixels are labeled (i.e., the background statistical model should be "mature"); in one embodiment of the present invention, this is a user-defined parameter. This may be implemented as a "look-ahead" procedure, in which a limited number of frames are used to accumulate the background statistical model prior to pixel labeling (Step 32 in FIG. 4).

While simply processing a user-defined number of frames may suffice to provide a mature statistical model, stability is a second concern (Step 313), and it depends upon the standard deviation of the background statistical model. In particular, as will be discussed below, the statistical background model includes a standard deviation for each pixel. The statistical model (for a particular pixel) is defined as having become "stable" when its variance (or, equivalently,

i.e., the pixel value, x, matches the background statistical model if

$|x - \bar{x}_{pixel}| \le K\,\sigma_{pixel}$,    (7)

where K is a user-defined number of standard deviations, $\bar{x}_{pixel}$ is the current mean value of the pixel in the background statistical model, and $\sigma_{pixel}$ is the current standard deviation of the pixel in the background statistical model. Step 314 thus ensures that, to the extent possible, only background pixels are used to develop and update the background statistical model.

In Step 315, the background statistical model is updated. In this embodiment, the background statistical model consists of the mean and standard deviation of the values for each pixel (over the sequence of frames). These are computed according to Eqns. (2) and (4a) above.

Following Step 315, the process loops back to Step 311 to determine if all pixels (in the current frame) have been processed. Once all of the pixels have been processed, the process proceeds to Step 316, where the background statistical model is finalized. This finalization consists of assigning to each pixel its current mean value and standard deviation (i.e., the result of processing all of the frames up to that point).

Note that it is possible for the background statistical model for a given pixel never to stabilize. This generally indicates that the particular pixel is not a background pixel in the sequence of frames, and there is, therefore, no need to assign it a value for the purposes of the background statistical model. Noting that, as discussed above, a scene model is also built and updated, there is always at least a deterministically-determined value associated with each pixel in the background model.

Following Step 316, the process goes to Step 32, as shown in FIG. 4, where the pixels in the frame are labeled according to their type (i.e., definite foreground, foreground or background). Step 32 is shown in further detail in the flowchart of FIGS. 6a and 6b.
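The per-pixel logic of Step 31 (FIG. 5) can be illustrated with a short sketch. It is not the patented implementation and rests on several assumptions: Eqns. (2) and (4a) are not reproduced in this section, so Welford's online mean/variance update stands in for them; maturity is assumed to be a simple frame count (`min_frames`); stability is assumed to mean that the standard deviation is at or below a user-chosen `max_std`; and scalar values stand in for chromatic data. The names `PixelStats`, `build_statistical_model`, and `finalize` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class PixelStats:
    """Running statistics for one pixel (hypothetical stand-in for the background statistical model)."""

    n: int = 0         # number of values folded into the model so far
    mean: float = 0.0  # running mean
    m2: float = 0.0    # running sum of squared deviations from the mean

    @property
    def std(self) -> float:
        # Population standard deviation of the values accumulated so far.
        return math.sqrt(self.m2 / self.n) if self.n else 0.0

    def update(self, x: float) -> None:
        # Step 315. Welford's online update, assumed here in place of Eqns. (2) and (4a).
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)


def build_statistical_model(model, frame, min_frames, max_std, k):
    """Step 31 (FIG. 5) for one aligned frame; model and frame are equal-length lists."""
    for stats, x in zip(model, frame):  # Step 311: visit every pixel once
        # Step 312 (assumed: mature after min_frames samples) and
        # Step 313 (assumed: stable while std <= max_std).
        if stats.n < min_frames or stats.std > max_std:
            stats.update(x)  # not yet mature or not stable: keep accumulating
        elif abs(x - stats.mean) <= k * stats.std:  # Step 314, Eqn. (7)
            stats.update(x)  # matches the model: presumed background, so update
        # Otherwise the value is presumed non-background and left out of the model.


def finalize(model):
    """Step 316: assign each pixel its current mean and standard deviation."""
    return [(s.mean, s.std) for s in model]


if __name__ == "__main__":
    model = [PixelStats() for _ in range(2)]
    for frame in ([10, 0], [10, 2], [10, 0], [10, 2], [50, 100]):
        build_statistical_model(model, frame, min_frames=3, max_std=2.0, k=2.0)
    print(finalize(model))
```

In the example, both pixels accept their first four values, while the outliers in the last frame (50 and 100) fail the Eqn. (7) test and are left out of the model, which is the filtering role the text assigns to Step 314.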
The following concepts are embodied in the description of Step 32 to follow. Ideally, labeling would always be done by testing each pixel against its corresponding point in the background statistical model, but this is not always possible. If the background statistical model is not ready to use on the basis of the number of frames processed (i.e., is not "mature"), then the process must fall back on testing against the corresponding point in the scene model. If the background statistical model is ready to use but has not yet settled down (i.e., is not "stable"), this is a sign that the pixel is varying and should be labeled as being foreground. If the background statistical model has, for some reason (i.e., because it fails to match the scene model or because it has become unsettled again), become unusable, the process must once again fall back on testing against the scene model.

As shown in FIG. 6a, Step 32 begins with Step 321, where 
it is determined whether or not all pixels (in the current 
frame) have been processed. If yes, Step 32 is complete; if 
not, the next pixel is processed in Steps 322 et seq. 

Step 322 determines whether or not the background statistical model is mature. This is done in the same manner as in Step 312 of FIG. 5, discussed above. If not, the process proceeds to Step 323, where it is determined whether or not the pixel matches the background chromatic data of the corresponding point of the scene model.

Step 323 is performed by carrying out a test to determine whether or not the given pixel falls within some range of the background chromatic data value. This is analogous to Step 314 of FIG. 5, substituting the background chromatic data value for the statistical mean. The threshold may be determined in a similar fashion (predetermined, user-determined, or the like).

If Step 323 determines that the pixel does match the background chromatic data, then the pixel is labeled BG (following connector A) in Step 329 of FIG. 6b. From Step 329, the process loops back (via connector D) to Step 321.

If Step 323 determines that the pixel does not match the background chromatic data, then the pixel is labeled FG (following connector B) in Step 3210 of FIG. 6b. From Step 3210, the process loops back (via connector D) to Step 321.

If Step 322 determines that the background statistical model is mature, processing proceeds to Step 324, which determines whether or not the background statistical model is stable. Step 324 performs this task in the same manner as Step 313 of FIG. 5, discussed above. If not, the process proceeds to Step 325, where it is determined if the background statistical model was ever stable (i.e., if it was once stable but is now unstable). If yes, then the process branches to Step 323, and the process proceeds from there as described above. If no, the pixel is labeled DFG (following connector C) in Step 3211 of FIG. 6b, after which the process loops back (via connector D) to Step 321.

If Step 324 determines that the background statistical model is stable, the process goes to Step 326. Step 326 tests whether the background statistical model matches the background chromatic data. Similar to the previous matching tests above, this test takes an absolute difference between the value of the background statistical model (i.e., the mean) for the pixel and the background chromatic data (i.e., of the scene model) for the pixel. This absolute difference is then compared to some threshold value, as above (predetermined, user-determined, or the like).

If Step 326 determines that there is not a match between the background statistical model and the background chromatic data, the process branches to Step 323, where processing proceeds in the same fashion as described above. If Step 326, on the other hand, determines that there is a match, the process continues to Step 327.

Step 327 determines whether or not the current pixel matches the background statistical model. This step is performed in the same manner as Step 314 of FIG. 5, discussed above. If the current pixel does match (which, as discussed above, is determined by comparing it to the mean value corresponding to the current pixel), the pixel is labeled BG (following connector A) in Step 329 of FIG. 6b, and the process then loops back (via connector D) to Step 321. If not, then further testing is performed in Step 328.

Step 328 determines whether, given that the current pixel value does not reflect a BG pixel, it reflects a FG pixel or a DFG pixel. This is done by determining if the pixel value is far from matching the background statistical model. As discussed above, a FG pixel is distinguished from a BG pixel (in Step 327) by determining if its value differs from the mean by more than a particular amount, for example, a number of standard deviations (see Eqn. (7)). Step 328 applies the same test, but using a larger range. Again, the threshold may be set as a predetermined parameter, as a computed parameter, or as a user-defined parameter, and it may be given in terms of a number of standard deviations from the mean, i.e.,

$|x - \bar{x}_{pixel}| \le N\,\sigma_{pixel}$,    (8)

where $\bar{x}_{pixel}$ and $\sigma_{pixel}$ are the mean and standard deviation of the pixel in the background statistical model, and N is a number greater than K of Eqn. (7). If the pixel value lies outside the range defined, for example, by Eqn. (8), it is labeled DFG (following connector C) in Step 3211 of FIG. 6b, and the process loops back (via connector D) to Step 321. If it lies within the range, the pixel is labeled FG (following connector B) in Step 3210 of FIG. 6b, and the process proceeds (via connector D) to Step 321.
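The labeling flow of FIGS. 6a and 6b (Steps 322 through 3211) can be condensed into one per-pixel decision function. The sketch below is illustrative only: it assumes the caller already tracks whether the statistical model is mature, stable, and was ever stable (for example, with running statistics of the kind built in Step 31), and it uses scalar pixel values in place of chromatic data. The names `label_pixel`, `Label`, `t_scene`, and `t_consist` are hypothetical; `k` and `n` play the roles of K and N in Eqns. (7) and (8).

```python
from enum import Enum


class Label(Enum):
    BG = "background"
    FG = "foreground"
    DFG = "definite foreground"


def label_pixel(x, scene_value, mean, std, *, mature, stable, ever_stable,
                t_scene, t_consist, k, n):
    """Label one pixel following FIGS. 6a and 6b (illustrative sketch).

    x           -- current pixel value
    scene_value -- background chromatic data of the scene model at this pixel
    mean, std   -- background statistical model at this pixel
    t_scene     -- match threshold for the scene-model test (Step 323)
    t_consist   -- threshold for the statistics-vs-scene test (Step 326)
    k, n        -- standard-deviation multipliers K and N of Eqns. (7) and (8)
    """
    if n <= k:
        raise ValueError("Eqn. (8) requires N > K")

    def scene_test():
        # Step 323: BG via Step 329 (connector A) or FG via Step 3210 (connector B).
        return Label.BG if abs(x - scene_value) <= t_scene else Label.FG

    if not mature:  # Step 322: statistics not ready, fall back on the scene model
        return scene_test()
    if not stable:  # Step 324
        # Step 325: once-stable models fall back on the scene model; a model
        # that never settled marks a varying pixel as DFG (Step 3211).
        return scene_test() if ever_stable else Label.DFG
    if abs(mean - scene_value) > t_consist:  # Step 326: statistics disagree with scene
        return scene_test()
    deviation = abs(x - mean)
    if deviation <= k * std:  # Step 327, Eqn. (7)
        return Label.BG       # Step 329
    if deviation <= n * std:  # Step 328, Eqn. (8)
        return Label.FG       # Step 3210
    return Label.DFG          # Step 3211


if __name__ == "__main__":
    common = dict(mature=True, stable=True, ever_stable=True,
                  t_scene=5.0, t_consist=10.0, k=2.0, n=4.0)
    for value in (108, 115, 130):
        print(value, label_pixel(value, 101, 100, 5, **common).name)
    # prints: 108 BG, then 115 FG, then 130 DFG (one per line)
```

With K = 2 and N = 4 around a mean of 100 and a standard deviation of 5, the BG band is 90 to 110 and the FG band extends to 80 to 120, so the three sample values land in the BG, FG, and DFG bands respectively. Keeping the scene-model test as a single inner function mirrors the way the flowchart routes Steps 322, 325, and 326 back to Step 323.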

After Step 32 is complete, the process proceeds to Step 8, 
as shown in FIG. 4, where spatial/temporal filtering is 
performed on the pixels in the frame. Step 8 is implemented, 
in this embodiment of the invention, in the same manner in 
which it is implemented for the two-pass embodiment, 
except that the pixel labeling algorithm of FIGS. 6a and 6b 
is used for Steps 833 and 837 of Step 83 (as opposed to the 
pixel labeling algorithms used in the two-pass embodiment). 
Following Step 8, the process loops back to Step 2, where, 
if all frames have been processed, the process ends. 

A single-pass approach, like the one presented here, has the
advantage of not requiring a second pass, thus reducing the
latency associated with the process. This is useful for 
applications in which high latencies would be detrimental, 
for example, video teleconferencing, webcasting, real-time 
gaming, and the like. 

3. Third Embodiment: Modified One-Pass Segmentation

While the one-pass approach described above has a lower 
latency than the two-pass approach, it does have a disad- 
vantage in regard to the background statistical model. In 
particular, the cumulative statistical modeling approach used 
in the one-pass embodiment of the invention may stabilize 
on a non-representative statistical model for an element (i.e., 
pixel, region, etc.; that is, whatever size element is under 
consideration). If the values (e.g., chromatic values) of 
frame elements corresponding to a particular element of the 
video scene fundamentally change (i.e., something happens 
to change the video, for example, a parked car driving away, 
a moving car parking, the lighting changes, etc.), then the 
scene model element will no longer accurately represent the 
true scene. This can be addressed by utilizing a mechanism 
for dynamically updating the background statistical model 
so that at any given time it accurately represents the true 
nature of the scene depicted in the video. Such a mechanism is depicted in the embodiment of the invention shown in FIG. 7.

In FIG. 7, Steps 1-3, 32, 8, and 9 are as described in the one-pass embodiment above. The embodiment of FIG. 7 differs from that of FIG. 4 in that after a given frame is aligned with the scene model (Step 3), the process executes Step 310, in which the background statistical model and, simultaneously, a secondary background statistical model are built. Step 310 is more fully described in connection with FIGS. 8a and 8b.

As shown in FIG. 8a, Step 310 includes all of the steps shown in Step 31 in FIG. 5 (which are shown using the same reference numerals), and it begins with a step of determining whether or not all pixels have yet been processed (Step 311). If not, the next pixel is processed by proceeding to Step 312. In Step 312, it is determined whether or not the background statistical model is mature. If not, the process branches to Step 315, where the pixel is used to update the background statistical model. Following Step 315, the process loops back to Step 311.

If Step 312 determines that the background statistical model is mature, the process proceeds to Step 313, where it is determined whether or not the background statistical model is stable. If it is not, then, as in the case of a negative determination in Step 312, the process branches to Step 315 (and then loops back to Step 311). Otherwise, the process proceeds to Step 314.

In Step 314, it is determined whether or not the pixel under consideration matches the background statistical model. If it does, the process proceeds with Step 315 (and then loops back to Step 311); otherwise, the process executes the steps shown in FIG. 8b, which build and update a secondary background statistical model. This secondary background statistical model is built in parallel with the background statistical model, as reflected in FIG. 8b; uses the same procedures as are used to build and update the background statistical model; and represents the pixel values that do not match the background statistical model.

Following a negative determination in Step 314, the process then makes a determination as to whether or not the secondary background statistical model is mature (Step 3107). This determination is made in the same fashion as in Step 312. If not, the process branches to Step 3109, where the secondary background statistical model is updated, using the same procedures as for the background statistical model (Step 315). From Step 3109, the process loops back to Step 311 (in FIG. 8a).

If Step 3107 determines that the secondary background statistical model is mature, the process proceeds to Step 3108, which determines (using the same procedures as in Step 313) whether or not the secondary background statistical model is stable. If not, the process proceeds to Step 3109 (and from there to Step 311). If yes, then the process branches to Step 31010, in which the background statistical model is replaced with the secondary background statistical model, after which the process loops back to Step 311. Additionally, concurrently with the replacement of the background statistical model by the secondary background statistical model in Step 31010, the scene model data is replaced with the mean value of the secondary statistical model. At this point, the secondary background statistical model is reset to zero, and a new one will be built using subsequent data.

This modified one-pass embodiment has the advantage of improved statistical accuracy over the one-pass embodiment, and it solves the potential problem of changing background images. It does this while still maintaining improved latency time over the two-pass embodiment, and at only a negligible decrease in processing speed compared with the one-pass embodiment.

4. Additional Embodiments and Remarks

While the above discussion considers two-level and three-level pixel labeling algorithms, this embodiment is not limited to these cases. Indeed, it is contemplated that an arbitrary number of decision levels, corresponding to different ranges (i.e., threshold values), may be used. In such a case, fuzzy or soft-decision logic would be used to make decisions in subsequent steps of the segmentation process.

The above discussion primarily addresses pixels and chromatic values (which may be RGB, YUV, intensity, etc.); however, as discussed above, the invention is not limited to these quantities. Regions other than pixels may be used, and quantities other than chromatic values may be used.

As discussed above, the invention, including all of the embodiments discussed in the preceding sections, may be embodied in the form of a computer system or in the form of a computer-readable medium containing software implementing the invention. This is depicted in FIG. 9, which shows a plan view for a computer system for the invention. The computer 91 includes a computer-readable medium 92 embodying software for implementing the invention and/or software to operate the computer 91 in accordance with the invention. Computer 91 receives a video stream and outputs segmented video, as shown. Alternatively, the segmented video may be further processed within the computer.

Also as discussed above, the statistical pixel modeling methods described above may be incorporated into a method of implementing an intelligent video surveillance system. FIG. 10 depicts an embodiment of such a method. In particular, block 1001 represents the use of statistical pixel modeling, e.g., as described above. Once the statistical pixel modeling has been completed, block 1002 uses the results to identify and classify objects. Block 1002 may use, for example, statistical or template-oriented methods for performing such identification and classification. In performing identification and classification, it is determined whether or not a given object is an object of interest; for example, one may be interested in tracking the movements of people through an area under surveillance, which would make people "objects of interest." In Block 1003, behaviors of objects of interest are analyzed; for example, it may be determined if a person has entered a restricted area. Finally, in Block 1004, if desired, various notifications may be sent out or other appropriate actions taken.

The invention has been described in detail with respect to preferred embodiments, and it will now be apparent from the foregoing to those skilled in the art that changes and modifications may be made without departing from the invention in its broader aspects. The invention, therefore, as defined in the appended claims, is intended to cover all such changes and modifications as fall within the true spirit of the invention.

We claim:

1. A method of implementing an intelligent video surveillance system, comprising:
obtaining a frame sequence from an input video stream;
executing a first-pass method for each frame of the frame sequence, the first-pass method comprising the steps of:
aligning the frame with a scene model; and
updating a background statistical model;
finalizing the background statistical model;
executing a second-pass method for each frame of the frame sequence, the second-pass method comprising the steps of:
labeling each region of the frame; and
performing spatial/temporal filtering of the regions of the frame;
identifying and classifying objects using the labeled and filtered regions; and
analyzing behaviors of at least one of the objects.

2. A computer-readable medium comprising software implementing the method of claim 1.

3. An intelligent video surveillance system comprising a computer system comprising:
a computer; and
a computer-readable medium according to claim 2.

4. The method of claim 1, wherein said analyzing behaviors of at least one of the objects comprises:
tracking at least one of the objects.

5. The method of claim 1, further comprising:
creating at least one rule to detect at least one specific activity;
wherein said analyzing behaviors of at least one of the objects includes applying the at least one rule.

6. The method of claim 5, wherein said at least one rule includes at least one virtual tripwire and determining when the at least one virtual tripwire is crossed.

7. The method of claim 5, wherein said at least one rule includes a definition of at least one area and the determining at least one of when an object enters, when an object leaves, and when an object loiters in the at least one area.

8. The method of claim 5, wherein said at least one rule includes at least one of determining when an object is added to a scene and determining when an object is removed from a scene.

9. A method of implementing an automated closed-circuit television (CCTV) surveillance system, comprising:
providing CCTV equipment generating an input video stream; and
implementing the method of claim 1.

10. A method of implementing an automated security system, comprising the method of claim 1.

11. A method of implementing an automated anti-terrorism system, comprising the method of claim 1.

12. A method of implementing an automated market research system, comprising the method of claim 1.

13. The method of claim 12, wherein said analyzing behaviors of at least one of the objects comprises:
tracking behaviors of at least one of the objects in at least one retail location.

14. A method of implementing an automated traffic monitoring system, comprising the method of claim 1.

15. The method of claim 14, wherein said analyzing behaviors of at least one of the objects comprises at least one of:
detecting wrong-way traffic;
detecting a broken-down vehicle;
detecting an accident; and
detecting a road blockage.

16. A method of implementing a video compression system, comprising the method of claim 1.

17. A method of implementing an intelligent video surveillance system, comprising:
obtaining a frame sequence from a video stream;
for each frame in the frame sequence, performing the following steps:
aligning the frame with a scene model;
building a background statistical model;
labeling the regions of the frame; and
performing spatial/temporal filtering;
identifying and classifying objects based on the results of the labeling and filtering; and
analyzing behaviors of at least one object.

18. A computer-readable medium comprising software implementing the method of claim 17.

19. An intelligent video surveillance system comprising a computer system comprising:
a computer; and
a computer-readable medium according to claim 18.

20. The method of claim 17, wherein said analyzing behaviors of at least one of the objects comprises:
tracking at least one of the objects.

21. The method of claim 17, further comprising:
creating at least one rule to detect at least one specific activity;
wherein said analyzing behaviors of at least one of the objects includes applying the at least one rule.

22. The method of claim 21, wherein said at least one rule includes at least one virtual tripwire and determining when the at least one virtual tripwire is crossed.

23. The method of claim 21, wherein said at least one rule includes a definition of at least one area and the determining at least one of when an object enters, when an object leaves, and when an object loiters in the at least one area.

24. The method of claim 21, wherein said at least one rule includes at least one of determining when an object is added to a scene and determining when an object is removed from a scene.

25. A method of implementing an automated closed-circuit television (CCTV) surveillance system, comprising:
providing CCTV equipment generating an input video stream; and
implementing the method of claim 17.

26. A method of implementing an automated security system, comprising the method of claim 17.

27. A method of implementing an automated anti-terrorism system, comprising the method of claim 17.

28. A method of implementing an automated market research system, comprising the method of claim 17.

29. The method of claim 28, wherein said analyzing behaviors of at least one of the objects comprises:
tracking behaviors of at least one of the objects in at least one retail location.

30. A method of implementing an automated traffic monitoring system, comprising the method of claim 17.

31. The method of claim 30, wherein said analyzing behaviors of at least one of the objects comprises at least one of:
detecting wrong-way traffic;
detecting a broken-down vehicle;
detecting an accident; and
detecting a road blockage.

32. A method of implementing a video compression system, comprising the method of claim 17.

33. A method of implementing a video compression system, comprising the method of claim 17.

34. A method of implementing an intelligent video surveillance system, comprising:
obtaining a frame sequence from a video stream;
for each frame in the frame sequence, performing the following steps:
aligning the frame with a scene model;
building a background statistical model and a secondary statistical model;
labeling the regions of the frame; and
performing spatial/temporal filtering;
identifying and classifying objects based on the results of the labeling and filtering; and
analyzing behaviors of at least one object.

35. A computer-readable medium comprising software 
implementing the method of claim 34. 

36. An intelligent video surveillance system comprising a 
computer system comprising: 

a computer; and
a computer-readable medium according to claim 35. 

37. The method of claim 34, wherein said analyzing 
behaviors of at least one of the objects comprises: 

tracking at least one of the objects.

38. The method of claim 34, further comprising:
creating at least one rule to detect at least one specific activity;

wherein said analyzing behaviors of at least one of the 
objects includes applying the at least one rule. 

39. The method of claim 38, wherein said at least one rule
includes at least one virtual tripwire and determining when 
the at least one virtual tripwire is crossed. 

40. The method of claim 38, wherein said at least one rule
includes a definition of at least one area and the determining
at least one of when an object enters, when an object leaves,
and when an object loiters in the at least one area.

41. The method of claim 38, wherein said at least one rule
includes at least one of determining when an object is added
to a scene and determining when an object is removed from
a scene.

42. A method of implementing an automated closed- 
circuit television (CCTV) surveillance system, comprising: 

providing CCTV equipment generating an input video 
stream; and 

implementing the method of claim 34.

43. A method of implementing an automated security 
system, comprising the method of claim 34. 

44. A method of implementing an automated anti-terror- 
ism system, comprising the method of claim 34. 

45. A method of implementing an automated market
research system, comprising the method of claim 34. 

46. The method of claim 45, wherein said analyzing 
behaviors of at least one of the objects comprises: 

tracking behaviors of at least one of the objects in at least 
one retail location.

47. A method of implementing an automated traffic moni- 
toring system, comprising the method of claim 34. 

48. The method of claim 47, wherein said analyzing 
behaviors of at least one of the objects comprises at least one 
of: 
detecting wrong-way traffic;
detecting a broken-down vehicle; 
detecting an accident; and 
detecting a road blockage. 

49. A method of implementing a video compression 
system, comprising the method of claim 34. 

50. An apparatus for intelligent video surveillance
adapted to perform the method comprising:

obtaining a frame sequence from an input video stream;
executing a first-pass method for each frame of the frame 

sequence, the first-pass method comprising the steps of: 

aligning the frame with a scene model; and 

updating a background statistical model;
finalizing the background statistical model; 
executing a second-pass method for each frame of the 

frame sequence, the second-pass method comprising 

the steps of: 

labeling each region of the frame; and 
performing spatial/temporal filtering of the regions of 
the frame; 

identifying and classifying objects using the labeled and 

filtered regions; and 
analyzing behaviors of at least one of the objects. 

51. The apparatus of claim 50 wherein the apparatus
comprises application-specific hardware to emulate a computer
and/or software adapted to perform said obtaining,
said executing a first-pass method, said finalizing, said
executing a second-pass method, said identifying, and said
analyzing.

52. An apparatus for intelligent video surveillance
adapted to perform the method comprising: 

obtaining a frame sequence from a video stream; 
for each frame in the frame sequence, performing the 
following steps: 

aligning the frame with a scene model; 

building a background statistical model; 

labeling the regions of the frame; and 

performing spatial/temporal filtering; 
identifying and classifying objects based on the results of 

the labeling and filtering; and 
analyzing behaviors of at least one object.

53. The apparatus of claim 52 wherein the apparatus
comprises application-specific hardware to emulate a computer
and/or software adapted to perform said obtaining,
said aligning, said building, said labeling, said filtering, said
identifying, and said analyzing.



